Take-home Exercise 3

Interactivity in Visual Analytics

Antonius Handy https://www.linkedin.com/in/antoniushandy (Singapore Management University, Master of IT in Business)https://scis.smu.edu.sg/master-it-business
2022-05-15

1. Overview

In this take-home exercise 3, we are going to reveal the economic of the city of Engagement, Ohio USA by using appropriate static and interactive statistical graphics methods. We will show how the financial health of the residents changes over the period of time and how the wages are compared against overall cost of living.

The data is processed by using appropriate tidyverse family of packages, whereas the statistical graphics are prepared using ggplot2 and its extensions.

2. Getting Started

Before we get started, it is important for us to ensure that the required R packages have been installed. If yes, we will load the R packages. If they have yet to be installed, we will install the R packages and load them onto R environment.

The packages required for this exercise are tidyverse, ggiraph, plotly, gganimate, zoo, trelliscopejs, gifski and gapminder.

packages = c('tidyverse','ggiraph','plotly','gganimate','zoo','trelliscopejs','gifski','gapminder')

for(p in packages){
  if(!require(p, character.only = T)){
    install.packages(p)
  }
  library(p, character.only = T)
}

3. Data

3.1 Data Source

The original dataset was obtained from VAST Challenge 2022 in csv format. It shows the financial report of 1011 residents of Engagement, OH that have agreed to participate in this study from March 2022 until May 2023.

3.2 Importing Data

The code chunk below imports a dataset called FinancialJournal.csv into R by using read_csv() of readr and saves it as tibble data frame called financial. It consists of 1,856,330 records as shown below.

financial <- read_csv("data/FinancialJournal.csv")
head(financial)
# A tibble: 6 x 4
  participantId timestamp           category  amount
          <dbl> <dttm>              <chr>      <dbl>
1             0 2022-03-01 00:00:00 Wage      2473. 
2             0 2022-03-01 00:00:00 Shelter   -555. 
3             0 2022-03-01 00:00:00 Education  -38.0
4             1 2022-03-01 00:00:00 Wage      2047. 
5             1 2022-03-01 00:00:00 Shelter   -555. 
6             1 2022-03-01 00:00:00 Education  -38.0

3.3 Data Wrangling

As we are not interested in time, we will only pick up the month and year by using as.yearmon() of zoo and rename the new column as yearmonth.

The financial journal consists of 6 categories namely Education, Food, Recreation, Rent Adjustment, Shelter and Wage. We are going to sum them up and group them by participant ID, yearmonth and category.

financial <- financial %>%
  mutate(yearmonth = as.yearmon(timestamp)) %>%
  group_by(participantId, yearmonth, category) %>%
  summarise(Total = sum(amount))
head(financial)
# A tibble: 6 x 4
# Groups:   participantId, yearmonth [2]
  participantId yearmonth category     Total
          <dbl> <yearmon> <chr>        <dbl>
1             0 Mar 2022  Education    -76.0
2             0 Mar 2022  Food        -268. 
3             0 Mar 2022  Recreation  -349. 
4             0 Mar 2022  Shelter    -1110. 
5             0 Mar 2022  Wage       11932. 
6             0 Apr 2022  Education    -38.0

In this code chunk below, we are going to create a new table called expenses to reveal the living cost of the residents by excluding Wage and Rent Adjustment from category column.

expenses <- financial %>%
  filter(category != 'Wage') %>%
  filter(category != 'RentAdjustment')

Next, we are going to use pivot_wider() of dplyr to create new columns based on the amount of each category, and rename the new table as fin_wide.

We will also strip all NA values by specifying na.rm = TRUE and calculate the living cost, revenue and savings.

Living cost = Education + Food + Recreation + Shelter

Revenue = Wage + Rent Adjustment

Savings = Revenue - Living cost

fin_wide <- financial %>%
  pivot_wider(names_from = "category",
              values_from = "Total") %>%
  rowwise() %>%
  mutate(LivingCost = sum(c_across(c(1:4)),na.rm=T)) %>%
  mutate(Revenue = sum(c_across(c(5:6)),na.rm=T)) %>%
  mutate(Savings = sum(c_across(c(7:8)),na.rm=T))
head(fin_wide)
# A tibble: 6 x 11
# Rowwise:  participantId, yearmonth
  participantId yearmonth Education  Food Recreation Shelter   Wage
          <dbl> <yearmon>     <dbl> <dbl>      <dbl>   <dbl>  <dbl>
1             0 Mar 2022      -76.0 -268.      -349.  -1110. 11932.
2             0 Apr 2022      -38.0 -266.      -219.   -555.  8637.
3             0 May 2022      -38.0 -265.      -383.   -555.  9048.
4             0 Jun 2022      -38.0 -257.      -466.   -555.  9048.
5             0 Jul 2022      -38.0 -270.     -1070.   -555.  8637.
6             0 Aug 2022      -38.0 -262.      -314.   -555.  9459.
# ... with 4 more variables: RentAdjustment <dbl>, LivingCost <dbl>,
#   Revenue <dbl>, Savings <dbl>

4. Economic Visualisation

Financial Health of Residents

Trellis display below shows how monthly savings of each participant change from March 2022 to May 2023.

qplot(yearmonth, Savings, data = fin_wide) +
  labs(x = "Year & Month",
       y = "Savings") +
  theme_bw() +
  facet_trelliscope(~ participantId,
                    nrow = 2, ncol = 5,
                    width = 450,
                    path = "trellis/",
                    self_contained = TRUE)

Overview of Living Costs and Revenue over Time

From the graph below, it is noticed that there is a major shift from the top right hand corner towards the bottom left hand corner. In general, both revenue and living costs are getting lower.

ggplot(fin_wide %>% mutate(yearmonth = as.numeric(yearmonth)),
       aes(x = Revenue, y = LivingCost*-1, 
           size = Revenue, 
           colour = participantId)) +
  geom_point(alpha = 0.5, 
             show.legend = FALSE) +
  scale_color_gradientn(colours = rainbow(100)) +
  scale_size(range = c(1, 6)) +
  labs(title = 'Year: {frame_time}', 
       x = 'Revenue ($)',
       y = 'Living Costs ($)') +
  transition_time(yearmonth) +
  ease_aes('linear')

Monthly Expenses Distribution

On average, residents allocate most of their monthly revenue for shelter, followed by recreation, food and lastly education.

p <- ggplot(data = expenses,
       aes(x = category, y = Total*-1))+
  geom_boxplot() +
  xlab("Category") +
  ylab("Amount ($)") +
  ggtitle("Average Monthly Expenses for Each Resident")
ggplotly(p)

Changes in Living Costs and Wage in More Detail

Let us now examine how wages and each daily expense other than shelter change from March 2022 to May 2023 in more detail. While the wages are shrinking, it is also found that the education and recreation cost are getting lower in May 2023.

If we look further into the second graph, we could also see that in March 2022 there are 2 distinct groups of participants. One group has relatively lower wages and lower food expenses, while the other group has higher wages as well as higher food expenses. However, in May 2023 they become homogeneous.

fin_mar2022 <- fin_wide %>%
  filter(yearmonth == 'Mar 2022') %>%
  mutate(Education = Education * -1) %>%
  mutate(Recreation = Recreation * -1) %>%
  mutate(Food = Food * -1)
d <- highlight_key(fin_mar2022)
p1 <- ggplot(data=d, 
            aes(x = Wage,
                y = Education)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,25000),
                  ylim=c(0,2000)) +
  scale_x_continuous(breaks=c(0,10000,20000))
p2 <- ggplot(data=d, 
            aes(x = Wage,
                y = Food)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,25000),
                  ylim=c(0,2000)) +
  scale_x_continuous(breaks=c(0,10000,20000))
p3 <- ggplot(data=d, 
            aes(x = Wage,
                y = Recreation)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,25000),
                  ylim=c(0,2000)) +
  scale_x_continuous(breaks=c(0,10000,20000))
fig <- subplot(ggplotly(p1),
               ggplotly(p2),
               ggplotly(p3))
annotations <- list(
  list(x = 0.15, y = 0.95,
       text = "Education vs Wage",
       xref = "paper",  
       yref = "paper",
       xanchor = "center",
       yanchor = "bottom",
       showarrow = FALSE),
  list(x = 0.5, y = 0.95,
       text = "Food vs Wage",
       xref = "paper",  
       yref = "paper",
       xanchor = "center",
       yanchor = "bottom",
       showarrow = FALSE),
  list(x = 0.85, y = 0.95,
       text = "Recreation vs Wage",
       xref = "paper",  
       yref = "paper",
       xanchor = "center",
       yanchor = "bottom",
       showarrow = FALSE)
)
fig <- fig %>%
  layout(title = 'Living Costs vs Wage in March 2022',
         annotations = annotations)
fig
fin_may2023 <- fin_wide %>%
  filter(yearmonth == 'May 2023') %>%
  mutate(Education = Education * -1) %>%
  mutate(Recreation = Recreation * -1) %>%
  mutate(Food = Food * -1)
d <- highlight_key(fin_may2023)
p1 <- ggplot(data=d, 
            aes(x = Wage,
                y = Education)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,25000),
                  ylim=c(0,2000)) +
  scale_x_continuous(breaks=c(0,10000,20000))
p2 <- ggplot(data=d, 
            aes(x = Wage,
                y = Food)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,25000),
                  ylim=c(0,2000)) +
  scale_x_continuous(breaks=c(0,10000,20000))
p3 <- ggplot(data=d, 
            aes(x = Wage,
                y = Recreation)) +
  geom_point(size=1) +
  coord_cartesian(xlim=c(0,25000),
                  ylim=c(0,2000)) +
  scale_x_continuous(breaks=c(0,10000,20000))
fig <- subplot(ggplotly(p1),
               ggplotly(p2),
               ggplotly(p3))
annotations <- list(
  list(x = 0.15, y = 0.95,
       text = "Education vs Wage",
       xref = "paper",  
       yref = "paper",
       xanchor = "center",
       yanchor = "bottom",
       showarrow = FALSE),
  list(x = 0.5, y = 0.95,
       text = "Food vs Wage",
       xref = "paper",  
       yref = "paper",
       xanchor = "center",
       yanchor = "bottom",
       showarrow = FALSE),
  list(x = 0.85, y = 0.95,
       text = "Recreation vs Wage",
       xref = "paper",  
       yref = "paper",
       xanchor = "center",
       yanchor = "bottom",
       showarrow = FALSE)
)
fig <- fig %>%
  layout(title = 'Living Costs vs Wage in May 2023',
         annotations = annotations)
fig

5. References

Altman, S., et. al. (2021, September 10). Data Wrangling. https://dcl-wrangle.stanford.edu/pivot-advanced.html#wider-1

Plotly (n.d.). Subplots in R. https://plotly.com/r/subplots/

STHDA. (n.d.). ggplot2 colors : How to change colors automatically and manually?. http://www.sthda.com/english/wiki/ggplot2-colors-how-to-change-colors-automatically-and-manually

Zach. (2021, May 26). How to Calculate the Sum by Group in R (With Examples). Statology. https://www.statology.org/sum-by-group-in-r/